Haplotype inference for present-absent genotype data using previously identified haplotypes and haplotype patterns
نویسندگان
چکیده
MOTIVATION Killer immunoglobulin-like receptor (KIR) genes vary considerably in their presence or absence on a specific regional haplotype. Because presence or absence of these genes is largely detected using locus-specific genotyping technology, the distinction between homozygosity and hemizygosity is often ambiguous. The performance of methods for haplotype inference (e.g. PL-EM, PHASE) for KIR genes may be compromised due to the large portion of ambiguous data. At the same time, many haplotypes or partial haplotype patterns have been previously identified and can be incorporated to facilitate haplotype inference for unphased genotype data. To accommodate the increased ambiguity of present-absent genotyping of KIR genes, we developed a hybrid approach combining a greedy algorithm with the Expectation-Maximization (EM) method for haplotype inference based on previously identified haplotypes and haplotype patterns. RESULTS We implemented this algorithm in a software package named HAPLO-IHP (Haplotype inference using identified haplotype patterns) and compared its performance with that of HAPLORE and PHASE on simulated KIR genotypes. We compared five measures in order to evaluate the reliability of haplotype assignments and the accuracy in estimating haplotype frequency. Our method outperformed the two existing techniques by all five measures when either 60% or 25% of previously identified haplotypes were incorporated into the analyses. AVAILABILITY The HAPLO-IHP is available at http://www.soph.uab.edu/Statgenetics/People/KZhang/HAPLO-IHP/index.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
منابع مشابه
A hidden Markov model for haplotype inference for present-absent data of clustered genes using identified haplotypes and haplotype patterns
The majority of killer cell immunoglobin-like receptor (KIR) genes are detected as either present or absent using locus-specific genotyping technology. Ambiguity arises from the presence of a specific KIR gene since the exact copy number (one or two) of that gene is unknown. Therefore, haplotype inference for these genes is becoming more challenging due to such large portion of missing informat...
متن کاملHaplotype Block Partitioning and tagSNP Selection under the Perfect Phylogeny Model
Single Nucleotide Polymorphisms (SNPs) are the most usual form of polymorphism in human genome.Analyses of genetic variations have revealed that individual genomes share common SNP-haplotypes. Theparticular pattern of these common variations forms a block-like structure on human genome. In this work,we develop a new method based on the Perfect Phylogeny Model to identify haplo...
متن کاملAssociation of P53 (+16ins-Arg) Haplotype with the Increased Susceptibility to Breast Cancer in Iranian-Azeri Women
Background:Many case-control investigations have showed the correlation of TP53 gene polymorphisms with the risk of breast cancer. However, the findings are not consistent. It has been suggested that the investigation of P53 genotype combinations and haplotypes may be more helpful than the detection of single polymorphisms. In the present study, we investigate...
متن کاملHaplotype block partitioning and tag SNP selection using genotype data and their applications to association studies.
Recent studies have revealed that linkage disequilibrium (LD) patterns vary across the human genome with some regions of high LD interspersed by regions of low LD. A small fraction of SNPs (tag SNPs) is sufficient to capture most of the haplotype structure of the human genome. In this paper, we develop a method to partition haplotypes into blocks and to identify tag SNPs based on genotype data ...
متن کاملLeveraging reads that span multiple single nucleotide polymorphisms for haplotype inference from sequencing data
MOTIVATION Haplotypes, defined as the sequence of alleles on one chromosome, are crucial for many genetic analyses. As experimental determination of haplotypes is extremely expensive, haplotypes are traditionally inferred using computational approaches from genotype data, i.e. the mixture of the genetic information from both haplotypes. Best performing approaches for haplotype inference rely on...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Bioinformatics
دوره 23 18 شماره
صفحات -
تاریخ انتشار 2007